Abstract. Let $T$ be a (not necessarily positive) weighted tree with $n$ leaves numbered by the set $\{1,\dots,n\}$.
Define $D_{i_1,\dots,i_k}(T)$ as the sum of the lengths of the edges of the minimal subtree
connecting $i_1,\dots,i_k$. We will call such numbers "$k$-weights" of the tree. In this paper, we characterize sets
of real numbers indexed by the subsets of any cardinality $\geq 2$ of an $n$-set to be the weights of a tree with $n$
leaves.



1 Introduction 

Consider a positive-weighted tree $T$ (that is, a tree such that every edge is endowed with a positive
real number, which we call the length of the edge) with $n$ leaves numbered by the set $\{1,\dots,n\}$. Let
$D_{i,j}(T)$ be the sum of the lengths of the edges of the shortest path connecting $i$ and $j$, for any
leaves $i$ and $j$ of $T$. We call such a number the "double weight" for $i$ and $j$.

In 1971 Buneman characterized the metrics on finite sets which are the double weights of a positive- 
weighted tree: 

Theorem 1 (Buneman) A metric $(D_{i,j})$ on $\{1,\dots,n\}$ is the metric induced by a positive-weighted
tree if and only if for all $i,j,k,h \in \{1,\dots,n\}$ the maximum of $\{D_{i,j} + D_{k,h},\ D_{i,k} + D_{j,h},\ D_{i,h} + D_{k,j}\}$
is attained at least twice.
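The condition of Theorem 1 (often called the four-point condition) can be tested mechanically. The following is a minimal sketch added for illustration; the function name `satisfies_four_point` and the dictionary encoding of the metric are assumptions, not notation from the paper. It assumes the input is already a metric, so only quadruples of distinct leaves are checked.

```python
from itertools import combinations


def satisfies_four_point(D, n, tol=1e-9):
    """Check the condition of Theorem 1 on quadruples of distinct leaves.

    D maps a pair (i, j) with i < j (leaves numbered 1..n) to D_{i,j}.
    The input is assumed to be a metric already.
    """
    def d(i, j):
        return D[(min(i, j), max(i, j))]

    for i, j, k, h in combinations(range(1, n + 1), 4):
        sums = sorted([d(i, j) + d(k, h), d(i, k) + d(j, h), d(i, h) + d(k, j)])
        # The maximum must be attained at least twice.
        if sums[2] - sums[1] > tol:
            return False
    return True
```

For instance, the double weights of a tree with cherries {1,2} and {3,4} pass the check, while a metric in which one of the three sums is strictly larger than the other two does not.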

The problem of reconstructing trees from data involving the distances between the leaves has several 
applications, such as phylogenetics: evolution of species can be represented by trees and, given 
distances between genetic sequences of some species, one can try to reconstruct the evolution tree 
from these distances. Some algorithms to reconstruct trees from the data {Dij} have been proposed. 
Among them is the neighbour-joining method, invented by Saitou and Nei in 1987 (see [NS], [SK] and [EEED]).

For any weighted tree $T$ with leaves $1,\dots,n$ and for any distinct $i_1,\dots,i_k \in \{1,\dots,n\}$, define
$D_{i_1,\dots,i_k}(T)$ as the sum of the lengths of the edges of the minimal subtree connecting $i_1,\dots,i_k$. We
call such numbers $k$-weights of the tree $T$, and the vector of the $k$-weights is called the $k$-dissimilarity
vector.
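The definition above translates directly into a computation: an edge belongs to the minimal subtree connecting the chosen leaves exactly when chosen leaves lie on both sides of it. The sketch below is an illustration only; the edge-list encoding and the name `k_weight` are assumptions, not notation from the paper. Weights may be negative.

```python
from collections import defaultdict


def k_weight(edges, leaves):
    """Sum of the edge weights of the minimal subtree connecting `leaves`.

    edges  : list of (u, v, weight) describing a tree
    leaves : iterable of distinct vertices of the tree
    """
    chosen = set(leaves)
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))

    # Root the tree at one of the chosen leaves and record a DFS preorder.
    root = next(iter(chosen))
    parent = {root: (None, 0.0)}
    order, stack = [], [root]
    while stack:
        x = stack.pop()
        order.append(x)
        for y, w in adj[x]:
            if y not in parent:
                parent[y] = (x, w)
                stack.append(y)

    # count[x] = number of chosen leaves in the subtree hanging below x.
    count = {x: int(x in chosen) for x in order}
    total = 0.0
    for x in reversed(order):          # descendants are visited before ancestors
        p, w = parent[x]
        if p is None:
            continue
        if 0 < count[x] < len(chosen):  # chosen leaves on both sides of the edge
            total += w
        count[p] += count[x]
    return total
```

With the four-leaf tree used in Section 3 (twigs of lengths 1, 2, 3, 4 and an inner edge of length 5), the call `k_weight(edges, [1, 2, 3])` adds the three twigs and the inner edge.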

In 2004, Pachter and Speyer proved the following theorem (see [PS]).

Theorem 2 (Pachter-Speyer). Let $k, n \in \mathbb{N}$ with $n \geq 2k - 1$ and $k \geq 3$. A positive-weighted tree
$T$ with $n$ leaves $1,\dots,n$ and no vertices of degree 2 is determined by the values $D_I$ where $I$ varies in
the $k$-subsets of $\{1,\dots,n\}$.

It can be interesting to characterize the sets of real numbers which are sets of $k$-weights of a tree.
In [Iri], Iriarte proves that $k$-dissimilarity vectors of positive-weighted trees are contained in the
tropical Grassmannian. See also [Co] and [Man].

Fewer results are known about not necessarily positive weighted trees. Observe that also in this
case the problem of reconstructing the weighted tree may have some applications: imagine that a
particle, by going through an edge of a tree, gains or loses some substance (as much as the weight
of the edge). If we know how much the substance of this particle varies by going from a leaf $i$ of
the tree to another leaf $j$ (the value $D_{i,j}$) for any $i$ and $j$, we can try to reconstruct the weighted
tree (which can represent a tree in the human body, a hydraulic web...). Analogously the numbers
$D_{i_1,\dots,i_k}$ can represent how much a material, by going from $i_s$ to the remaining leaves among
$i_1,\dots,i_k$, gains or loses of a certain substance. It can be interesting, given a set
$\{D_{i_1,\dots,i_k}\}_{i_1,\dots,i_k}$, to wonder if there exists a weighted tree with these $k$-weights.

In [Ru] we gave a characterization for sets indexed by 2-subsets (or 3-subsets) of an $n$-set to be
double (resp. triple) weights of a tree with n leaves (with not necessarily positive weights) and, by 
using these ideas, we proposed a slight modification of Saitou-Nei's Neighbour-Joining algorithm to 
reconstruct trees from the data Dij. 

Here we characterize sets of real numbers indexed by the subsets of any cardinality $\geq 2$ of an $n$-set
to be the weights of a tree (Theorem 10); besides, we extend the definition of $D_{i_1,\dots,i_k}(T)$ to the
case where $i_1,\dots,i_k$ are not distinct, and we find necessary and sufficient conditions for a set of real numbers
indexed by the submultisets of an $n$-set to be the set of the weights of a tree with $n$ leaves (Theorem
9) and necessary and sufficient conditions for a set of real numbers indexed by the $k$-submultisets
of an $n$-set to be the set of the $k$-weights of a tree with $n$ leaves (Theorem 11).

2 Some notation 

Definition 3 A cherry $B$ in a tree $T$ is a subtree such that only one of the inner vertices is not
bivalent; we call this vertex the "stalk" of the cherry and we say that the leaves of $B$ are neighbours.
We call the "twig" of a leaf of a cherry the path from the leaf to the stalk of the cherry.
A complete cherry is a cherry such that there doesn't exist another cherry strictly containing it.



[Figure: a cherry $B$.]




Notation 4 • For every $n \in \mathbb{N}$, let $[n] = \{1,\dots,n\}$.

• If $M$ is a set, denote the set of the $k$-subsets (without repetitions) of $M$ by $\binom{M}{k}$ and the set of the
"$k$-submultisets" (i.e. with possible repetitions) of $M$ by $M_k$.

• For any set $\{D_{i_1,\dots,i_k}\}$ of real numbers indexed by elements $\{i_1,\dots,i_k\} \in \binom{[n]}{k}$ or $[n]_k$, we denote
$D_{\{i_1,\dots,i_k\}}$ by $D_{i_1,\dots,i_k}$ for any order of $i_1,\dots,i_k$.

• A weighted tree is a tree such that every edge is endowed with a real number called weight or 
length of the edge. If the weights are positive we say that the tree is positive-weighted. Please 
note that in other papers "weighted" means positive-weighted. 

For $x, y$ vertices of a tree $T$, we denote by $d(x,y)$ (intrinsic distance) the number of the edges of
the path from $x$ to $y$.

For any leaf x of T and any subtree E, we define N(x, E) as the at least trivalent vertex in E with 
minimum intrinsic distance from x. 

Example. [Figure illustrating $N(x, E)$ for a leaf $x$.]



Now let $T$ be weighted and let $[n]$ be the set of its leaves. For $x, y$ vertices of $T$, we denote by $w(x,y)$
($w$-distance) the sum of the weights of the edges of the path from $x$ to $y$ (obviously it is not a
distance).

For any distinct $i_1,\dots,i_k \in [n]$, we define $D_{i_1,\dots,i_k}(T)$ as the sum of the lengths of the edges of the
minimal subtree connecting $i_1,\dots,i_k$. Besides, if $i_1,\dots,i_k$ are not distinct, we define $D_{i_1,\dots,i_k}(T)$ by
induction on the number of repetitions in the following way:

$$D_{x,x,Z}(T) = D_{x,Z}(T) + w(x, N(x,T)).$$

We call the numbers $D_{i_1,\dots,i_k}(T)$ $k$-weights of $T$.

Example. 




Definition 5 For any set $\{D_I\}_{I \in S}$ of real numbers parametrized by $S \subset \mathcal{M} := \{I \text{ submultiset of } [n]\}$
and for any $e, e' \in [n]$, we define $*_{e,e'}$ to be the following condition:

$D_{e,X} - D_{e',X}$ doesn't depend on $X$ s.t. $(e,X), (e',X) \in S$

and we say that $\alpha = \{\alpha_1,\dots,\alpha_r\} \subset [n]$ is a pseudocherry for $\{D_I\}_{I \in S}$ if $*_{\alpha_i,\alpha_j}$ holds for all $i,j$.
We say that $\alpha$ is a complete pseudocherry if there is no $\alpha' \in [n] - \alpha$ such that $*_{\alpha_i,\alpha'}$ holds for all $i$.
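The pseudocherry condition only involves differences of the given numbers, so it can be checked without knowing any tree. The following sketch is illustrative; the names `star_condition` and `is_pseudocherry`, and the encoding of each index as a sorted tuple, are assumptions. The condition is symmetric in the two leaves, so unordered pairs suffice.

```python
from itertools import combinations


def star_condition(D, e, f, tol=1e-9):
    """Check that D_{e,X} - D_{f,X} is the same for every X with both indices in D.

    D maps sorted tuples of leaves (subsets or multisets) to real numbers.
    """
    diffs = []
    for key, value in D.items():
        if e not in key:
            continue
        rest = list(key)
        rest.remove(e)                      # X = key with one copy of e removed
        other = tuple(sorted(rest + [f]))   # the index (f, X)
        if other in D:
            diffs.append(value - D[other])
    # Vacuously true when no admissible X exists.
    return all(abs(d - diffs[0]) <= tol for d in diffs)


def is_pseudocherry(D, alpha, tol=1e-9):
    """alpha is a pseudocherry if the condition holds for every pair in alpha."""
    return all(star_condition(D, e, f, tol) for e, f in combinations(alpha, 2))
```

For the double weights of a tree with cherries {1,2} and {3,4}, both cherries are reported as pseudocherries, while the pair (1, 3) fails the condition.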

Proposition 6 [Ru]. Let $T$ be a positive-weighted tree with leaves $1,\dots,n$ with $n \geq 2k - 1$. Let
$e, e' \in [n]$. Then $*_{e,e'}$ holds for $\{D_I(T)\}_{I \in \binom{[n]}{k}}$ if and only if $\{e,e'\}$ is a cherry, that is, $\{e,e'\}$ is a
pseudocherry if and only if it is a cherry.



3 The case of the trees with four leaves 

Lemma 7 Let $k \in \mathbb{N}$, $k \geq 2$. Let $\{D_I\}_{I \in [4]_k}$ be a set of real numbers. It is the set of $k$-weights
of a tree $T$ with $1,2,3,4$ as leaves and $\{1,2\}$ and $\{3,4\}$ as cherries if and only if the following
conditions hold:

A) $\{1,2\}$ and $\{3,4\}$ are pseudocherries and for any $i \in \{1,2\}$, $j \in \{3,4\}$

$$D_{i,X} - D_{j,X}$$

is the same for $X$ varying in the subsets of $[4]$ intersecting both $\{1,2\}$ and $\{3,4\}$, it is the same for
$X$ varying in the subsets of $\{1,2\}$, it is the same for $X$ varying in the subsets of $\{3,4\}$.

B)

$$-D_{i,Z} + D_{j,Z} - D_{i,Y} + D_{j,Y} = 2(D_{j,W} - D_{i,W})$$

for $i \in \{1,2\}$ and $j \in \{3,4\}$, for $Z \subset \{1,2\}$, $Y \subset \{3,4\}$, $W$ intersecting both $\{1,2\}$ and $\{3,4\}$.

Proof. $\Rightarrow$ Easy.

$\Leftarrow$ We will sometimes denote $D_{1,\dots,1,2,\dots,2,3,\dots,3,4,\dots,4}$, with 1 repeated $k_1$ times, 2 repeated $k_2$ times,
3 repeated $k_3$ times, 4 repeated $k_4$ times, by $D_{1^{k_1},2^{k_2},3^{k_3},4^{k_4}}$.

Remark. If there exists a tree $T$ without bivalent vertices, with $[4]$ as set of leaves and $\{1,2\}$ and
$\{3,4\}$ as cherries, by calling the weights of the edges as in the figure below, we have:
if $i$ and $j$ are in the same cherry, then $a_i - a_j = D_{i,X} - D_{j,X}$ for any $X \in [4]_{k-1}$;
if $i$ and $j$ are not in the same cherry, then:

$a_i - a_j = D_{i,X} - D_{j,X}$ for any $X \in [4]_{k-1}$ intersecting both cherries and
$a_i + f - a_j = D_{i,X} - D_{j,X}$ for any $X \in [4]_{k-1}$ in the same cherry as $j$.
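As a check of the direction $\Rightarrow$ of Lemma 7, condition B follows from the relations in the remark. The derivation below is a worked verification added for illustration. It also uses the relation $a_i - f - a_j = D_{i,Z} - D_{j,Z}$ for $Z$ in the same cherry as $i$, which is the last relation of the remark with the roles of $i$ and $j$ exchanged. Here $i \in \{1,2\}$, $j \in \{3,4\}$, $Z \subset \{1,2\}$, $Y \subset \{3,4\}$ and $W$ intersects both cherries.

```latex
\begin{align*}
-D_{i,Z} + D_{j,Z} &= -(a_i - f - a_j) = a_j - a_i + f, \\
-D_{i,Y} + D_{j,Y} &= -(a_i + f - a_j) = a_j - a_i - f, \\
-D_{i,Z} + D_{j,Z} - D_{i,Y} + D_{j,Y} &= 2(a_j - a_i) = 2(D_{j,W} - D_{i,W}).
\end{align*}
```

The inner edge length $f$ cancels in the sum, which is why the right-hand side only involves indices $W$ meeting both cherries.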




We define a tree as in the figure above with $a_1, a_2, a_3, a_4, f$ defined in the following way.
We define $a_1, a_2$ as the solution of the following linear system (for any $k_1, k_2$ with $k_1 + k_2 = k$ and
any $X \in [4]_{k-1}$):

$$\begin{cases} a_1 - a_2 = D_{1,X} - D_{2,X} \\ k_1 a_1 + k_2 a_2 = D_{1^{k_1},2^{k_2},3^0,4^0} \end{cases}$$

Obviously it admits only one solution and one can easily see that it does not depend on $X$ since
$*_{1,2}$ holds; besides it doesn't depend on $k_1, k_2$: in fact the system

$$\begin{cases} a_1 - a_2 = D_{1,X} - D_{2,X} \\ k_1 a_1 + k_2 a_2 = D_{1^{k_1},2^{k_2},3^0,4^0} \\ t_1 a_1 + t_2 a_2 = D_{1^{t_1},2^{t_2},3^0,4^0} \end{cases}$$

is compatible for any $t_1, t_2$ with $t_1 + t_2 = k$; to see this, it is sufficient to see that

$$\begin{cases} a_1 - a_2 = D_{1,X} - D_{2,X} \\ k_1 a_1 + k_2 a_2 = D_{1^{k_1},2^{k_2},3^0,4^0} \\ (k_1 - 1) a_1 + (k_2 + 1) a_2 = D_{1^{k_1-1},2^{k_2+1},3^0,4^0} \end{cases}$$

is compatible and this follows from $*_{1,2}$.
We define:

$$f = D_{i,Y} - D_{j,Y} - D_{i,W} + D_{j,W}$$

for any $i, j$ with $i \in \{1,2\}$, $j \in \{3,4\}$ and for $W$ intersecting both $\{1,2\}$ and $\{3,4\}$ and $Y$ in $\{3,4\}$.
One can easily see that it is a good definition, that is, it does not depend on $W$, $Y$. Equivalently
(by B) we can define

$$f = -D_{i,Z} + D_{j,Z} + D_{i,W} - D_{j,W}$$

for any $i, j$ with $i \in \{1,2\}$, $j \in \{3,4\}$ and for $W$ intersecting both $\{1,2\}$ and $\{3,4\}$ and $Z$ in $\{1,2\}$.
For $j \in \{3,4\}$ we define

$$a_j = a_i + (D_{j,W} - D_{i,W})$$

for $W$ intersecting both $\{1,2\}$ and $\{3,4\}$. It is a good definition by A.
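The definitions of $a_1, a_2, f, a_3, a_4$ can be tried out numerically for $k = 3$. The sketch below is an illustration under stated assumptions: `model_weights` and `reconstruct` are hypothetical names, the 3-weights are generated from a tree without bivalent vertices (so every repetition of a leaf adds its twig length), and the particular choices $k_1 = 2$, $k_2 = 1$, $X = (1,2)$, $Y = (3,4)$, $W = (2,4)$ or $(2,3)$ are just one admissible selection.

```python
from itertools import combinations_with_replacement


def model_weights(a, f, k=3):
    """k-weights of the four-leaf tree with cherries {1,2} and {3,4}.

    a maps each leaf to its twig length, f is the length of the inner edge.
    Multisets supported on a single leaf are skipped: they are not needed here.
    """
    D = {}
    for I in combinations_with_replacement((1, 2, 3, 4), k):
        if len(set(I)) < 2:
            continue
        both = any(i <= 2 for i in I) and any(i >= 3 for i in I)
        D[I] = sum(a[i] for i in I) + (f if both else 0)
    return D


def reconstruct(D):
    """Recover (a1, a2, a3, a4, f) from 3-weights indexed by sorted triples."""
    k1, k2 = 2, 1
    diff12 = D[(1, 1, 2)] - D[(1, 2, 2)]        # D_{1,X} - D_{2,X} with X = (1,2)
    # Solve a1 - a2 = diff12, k1*a1 + k2*a2 = D_{1^2,2^1}.
    a1 = (D[(1, 1, 2)] + k2 * diff12) / (k1 + k2)
    a2 = a1 - diff12
    # First definition of f with i = 1, j = 3, Y = (3,4), W = (2,4).
    f = D[(1, 3, 4)] - D[(3, 3, 4)] - D[(1, 2, 4)] + D[(2, 3, 4)]
    a3 = a1 + D[(2, 3, 4)] - D[(1, 2, 4)]       # j = 3, W = (2,4)
    a4 = a1 + D[(2, 3, 4)] - D[(1, 2, 3)]       # j = 4, W = (2,3)
    return a1, a2, a3, a4, f
```

Generating the 3-weights from twig lengths 1, -2, 3, 4 and inner edge -5 and feeding them to `reconstruct` returns the same five numbers, including the negative ones.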
We shall show now that for such a tree $T$ we have

$$D_{1^{k_1},2^{k_2},3^{k_3},4^{k_4}}(T) = D_{1^{k_1},2^{k_2},3^{k_3},4^{k_4}}.$$

• Let us first suppose that $k_1 + k_2 > 0$. We argue by induction on $k_3 + k_4$.
If $k_3 + k_4 = 0$ it is obvious by the definition of $a_1$ and $a_2$:

$$D_{1^{k_1},2^{k_2},3^0,4^0}(T) = k_1 a_1 + k_2 a_2 = D_{1^{k_1},2^{k_2},3^0,4^0}.$$

If $k_3 + k_4 = 1$, we can suppose for instance that $k_3 = 1$ and $k_4 = 0$. By the induction assumption, the second definition of $f$ and the definition of $a_3$,

$$\begin{aligned}
D_{1^{k_1},2^{k_2},3^1,4^0}(T) &= D_{1^{k_1+1},2^{k_2},3^0,4^0}(T) + f + a_3 - a_1 \\
&= D_{1^{k_1+1},2^{k_2},3^0,4^0} + D_{1^{k_1},2^{k_2},3^1,4^0} - D_{1^{k_1+1},2^{k_2},3^0,4^0} = D_{1^{k_1},2^{k_2},3^1,4^0}.
\end{aligned}$$

If $k_3 + k_4 - 1 > 0$, by the induction assumption and the definition of $a_3$,

$$\begin{aligned}
D_{1^{k_1},2^{k_2},3^{k_3},4^{k_4}}(T) &= D_{1^{k_1+1},2^{k_2},3^{k_3-1},4^{k_4}}(T) + a_3 - a_1 \\
&= D_{1^{k_1+1},2^{k_2},3^{k_3-1},4^{k_4}} + D_{1^{k_1},2^{k_2},3^{k_3},4^{k_4}} - D_{1^{k_1+1},2^{k_2},3^{k_3-1},4^{k_4}} = D_{1^{k_1},2^{k_2},3^{k_3},4^{k_4}}.
\end{aligned}$$

• Suppose now that $k_1 + k_2 = 0$. By the previous case, the first definition of $f$ and the definition of $a_3$,

$$\begin{aligned}
D_{1^0,2^0,3^{k_3},4^{k_4}}(T) &= D_{1^1,2^0,3^{k_3-1},4^{k_4}}(T) - f - a_1 + a_3 \\
&= D_{1^1,2^0,3^{k_3-1},4^{k_4}} + D_{1^0,2^0,3^{k_3},4^{k_4}} - D_{1^1,2^0,3^{k_3-1},4^{k_4}} = D_{1^0,2^0,3^{k_3},4^{k_4}}.
\end{aligned}$$

□

Lemma 8 Let $\{D_I\}_{I \subset [4],\ \mathrm{cardinality}(I) \geq 2}$ be a set of real numbers. It is the set of weights of a
tree $T$ with $1,2,3,4$ as leaves and $\{1,2\}$ and $\{3,4\}$ as cherries if and only if $\{1,2\}$ and $\{3,4\}$ are
pseudocherries and

A)

$$-D_{i,Z} + D_{j,Z} - D_{i,Y} + D_{j,Y} = 2(D_{j,W} - D_{i,W})$$

for $i \in \{1,2\}$ and $j \in \{3,4\}$, for $Z \subset \{1,2\}$, $Y \subset \{3,4\}$, $W$ intersecting both $\{1,2\}$ and $\{3,4\}$.

B) if we define $a_i$ for any $i \in \{1,2\}$ by $a_i = \frac{1}{2}(D_{i,j} + D_{i,X} - D_{j,X})$ for any $X \subset [4]$ and $j \in \{1,2\} - \{i\}$,
then, for $\{i,j\} = \{1,2\}$ and any $\delta \subset [4]$,

$$D_{i,j,\delta} = a_i + D_{j,\delta}.$$

Sketch of the proof. $\Rightarrow$ Easy to prove.

$\Leftarrow$ Construct a tree $T$ as in the proof of Lemma 7 from the $D_I$ with $\mathrm{cardinality}(I) = 2$. We have
to prove that $D_I(T) = D_I$ for any $I \subset [4]$ with $\mathrm{cardinality}(I) = 3, 4$.

$$D_{1,2,3}(T) = a_1 + D_{2,3}(T) = a_1 + D_{2,3} = D_{1,2,3}$$

Analogously for $D_{1,2,4}$.

$$\begin{aligned}
D_{2,3,4}(T) &= D_{1,2,4}(T) + \tfrac{1}{2}\left(-D_{1,2}(T) + D_{3,2}(T) - D_{1,4}(T) + D_{3,4}(T)\right) \\
&= D_{1,2,4} + \tfrac{1}{2}\left(-D_{1,2} + D_{3,2} - D_{1,4} + D_{3,4}\right) = D_{2,3,4}
\end{aligned}$$

Analogously for $D_{1,3,4}$.

$$D_{1,2,3,4}(T) = a_1 + D_{2,3,4}(T) = a_1 + D_{2,3,4} = D_{1,2,3,4}$$

□ 

4 Characterization of the set of dissimilarity vectors 

In this section our first aim is to characterize the sets of real numbers indexed by subsets or
submultisets of $[n]$ which come from a tree. We also characterize the sets of real numbers indexed
by the elements of $[n]_k$ for $k$ fixed. Briefly, in [Ru] we proved that for $k = 2$ such a set
comes from a tree if and only if in $[n]$ there are at least two pseudocherries and, if we substitute
every pseudocherry with a point, the same condition holds for the new set, and so on. Obviously for
higher $k$ the situation is more complicated.

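Theorem 9 Let $n \geq 4$. Let $\{D_I\}_{I \in [n]_k \text{ for some } k \geq 2}$ be a set of real numbers. It is the set of the
weights of a tree $T$ with leaves $1,\dots,n$ if and only if there exist $\alpha, \beta \subset [n]$ such that:

1) $\alpha$ and $\beta$ are disjoint complete pseudocherries and for any $\alpha_i \in \alpha$, $\beta_j \in \beta$ the number

$$D_{\alpha_i,X} - D_{\beta_j,X}$$

is the same for $X$ varying in the submultisets of $[n]$ intersecting both $\alpha$ and $\beta$, it is the same for $X$
varying in the submultisets of $\alpha$, it is the same for $X$ varying in the submultisets of $\beta$.

2)

$$-D_{\alpha_i,Z} + D_{\beta_j,Z} - D_{\alpha_i,Y} + D_{\beta_j,Y} = 2(D_{\beta_j,W} - D_{\alpha_i,W})$$

for $\alpha_i \in \alpha$, $\beta_j \in \beta$, $Z \subset \alpha$, $Y \subset \beta$ and $W$ intersecting both $\alpha$ and $\beta$.

3) if, for any $\alpha_i \in \alpha$, we define $a_{\alpha_i}$ by $a_{\alpha_i} = \frac{1}{2}(D_{\alpha_i,\alpha_j} + D_{\alpha_i,X} - D_{\alpha_j,X})$ for any $X$ submultiset of
$[n]$ and $\alpha_j \in \alpha$, then, for any $\alpha_1,\dots,\alpha_t \in \alpha$ and $\delta_1,\dots,\delta_s \in [n]$,

$$D_{\alpha_1,\dots,\alpha_t,\delta_1,\dots,\delta_s} = a_{\alpha_1} + \dots + a_{\alpha_{t-1}} + D_{\alpha_t,\delta_1,\dots,\delta_s}$$

(i.e. $2(D_{\alpha_1,\dots,\alpha_t,\delta_1,\dots,\delta_s} - D_{\alpha_t,\delta_1,\dots,\delta_s}) = D_{\alpha_1,\alpha_2} + \dots + D_{\alpha_{t-1},\alpha_t} + D_{\alpha_1,X} - D_{\alpha_t,X}$ $\forall X \subset [n]$).

4) if we define $M = ([n] - \alpha) \cup \{a\}$ and

$$D_{a,i_1,\dots,i_k} = D_{\alpha_i,i_1,\dots,i_k} - a_{\alpha_i}$$

(for any $\alpha_i \in \alpha$), then the same conditions hold for $M$.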

Proof. $\Rightarrow$ Easy to prove; for instance observe that, for $X$ subset of $\alpha$, $D_{\alpha_i,X} - D_{\beta_j,X}$ is the sum of
the weights of the edges of the twig of $\alpha_i$, minus the sum of the weights of the edges of the twig of
$\beta_j$, minus the $w$-distance between the stalks of $\alpha$ and $\beta$.

$\Leftarrow$ First observe that the definition of $a_{\alpha_i}$ is equivalent to the formula

$$\begin{cases} a_{\alpha_i} - a_{\alpha_j} = D_{\alpha_i,X} - D_{\alpha_j,X} \\ a_{\alpha_i} + a_{\alpha_j} = D_{\alpha_i,\alpha_j} \end{cases}$$

for any $X$ submultiset of $[n]$ and it is a good definition, that is, it depends neither on $X$ nor on $\alpha_j$
(because $\alpha$ is a pseudocherry). Besides, obviously, also the definition of $D_{a,i_1,\dots,i_k}$ doesn't depend
on $\alpha_i$, because $\alpha$ is a pseudocherry.

We can prove the statement by induction on $n$. The case $n = 4$ follows from Lemma 7 (observe that
conditions 1 and 2 of the theorem imply conditions A and B of the lemma and that the definitions
of $a_i$ and $f$ in the proof of the lemma don't depend on $k$, so the tree we construct is the same for
every $k$).

Let us prove the induction step. By induction assumption there exists a tree $R$ such that

$$D_I(R) = D_I \quad \forall I \in M_k \text{ for } k \geq 2.$$

We define the tree $T$ by attaching to $R$ a cherry $\alpha$ with lengths of the twigs $a_{\alpha_i}$ to the point $a$. We
must show that $D_{\alpha_1,\dots,\alpha_t,\delta_1,\dots,\delta_s}(T) = D_{\alpha_1,\dots,\alpha_t,\delta_1,\dots,\delta_s}$ for any $\alpha_1,\dots,\alpha_t \in \alpha$, $\delta_1,\dots,\delta_s \in [n] - \alpha$. We
prove this by induction on $t$.
$t = 0$ is obvious.



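$t = 1$:

$$\begin{aligned}
D_{\alpha_j,\delta_1,\dots,\delta_s}(T) &= a_{\alpha_j} + D_{a,\delta_1,\dots,\delta_s}(R) && \text{(def. of } T\text{)} \\
&= a_{\alpha_j} + D_{a,\delta_1,\dots,\delta_s} && \text{(ind. ass.)} \\
&= a_{\alpha_j} + D_{\alpha_j,\delta_1,\dots,\delta_s} - a_{\alpha_j} = D_{\alpha_j,\delta_1,\dots,\delta_s} && \text{(def. of } D_{a,\delta_1,\dots,\delta_s}\text{)}
\end{aligned}$$

Induction step:

$$\begin{aligned}
D_{\alpha_1,\dots,\alpha_t,\delta_1,\dots,\delta_s}(T) &= a_{\alpha_1} + \dots + a_{\alpha_t} + D_{a,\delta_1,\dots,\delta_s}(R) && \text{(def. of } T\text{)} \\
&= a_{\alpha_1} + \dots + a_{\alpha_t} + D_{a,\delta_1,\dots,\delta_s} && \text{(ind. ass.)} \\
&= a_{\alpha_1} + \dots + a_{\alpha_t} + D_{\alpha_1,\delta_1,\dots,\delta_s} - a_{\alpha_1} && \text{(def. of } D_{a,\delta_1,\dots,\delta_s}\text{)} \\
&= a_{\alpha_2} + \dots + a_{\alpha_t} + D_{\alpha_1,\delta_1,\dots,\delta_s} = D_{\alpha_1,\dots,\alpha_t,\delta_1,\dots,\delta_s} && \text{(by 3)}
\end{aligned}$$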



□ 



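With a completely analogous proof, we can prove the following statement for a set of real numbers
indexed by subsets of $[n]$ (without repetitions):

Theorem 10 Let $n \geq 4$. Let $\{D_I\}_{I \subset [n],\ \mathrm{cardinality}(I) \geq 2}$ be a set of real numbers. It is the set of the
weights of a tree $T$ with leaves $1,\dots,n$ if and only if there exist two disjoint complete pseudocherries
$\alpha$ and $\beta$ in $[n]$ such that:

1)

$$-D_{\alpha_i,Z} + D_{\beta_j,Z} - D_{\alpha_i,Y} + D_{\beta_j,Y} = 2(D_{\beta_j,W} - D_{\alpha_i,W})$$

for $\alpha_i \in \alpha$, $\beta_j \in \beta$, $Z \subset \alpha$, $Y \subset \beta$ and $W$ intersecting both $\alpha$ and $\beta$.

2) if, for any $\alpha_i \in \alpha$, we define $a_{\alpha_i}$ by $a_{\alpha_i} = \frac{1}{2}(D_{\alpha_i,\alpha_j} + D_{\alpha_i,X} - D_{\alpha_j,X})$ for any $X \subset [n]$ and any
$\alpha_j \in \alpha$, then, for any $\alpha_1,\dots,\alpha_t \in \alpha$ and $\delta_1,\dots,\delta_s \in [n]$,

$$D_{\alpha_1,\dots,\alpha_t,\delta_1,\dots,\delta_s} = a_{\alpha_1} + \dots + a_{\alpha_{t-1}} + D_{\alpha_t,\delta_1,\dots,\delta_s}$$

(i.e. $2(D_{\alpha_1,\dots,\alpha_t,\delta_1,\dots,\delta_s} - D_{\alpha_t,\delta_1,\dots,\delta_s}) = D_{\alpha_1,\alpha_2} + \dots + D_{\alpha_{t-1},\alpha_t} + D_{\alpha_1,X} - D_{\alpha_t,X}$ $\forall X \subset [n]$).

3) if we define $M = ([n] - \alpha) \cup \{a\}$ and

$$D_{a,i_1,\dots,i_k} = D_{\alpha_i,i_1,\dots,i_k} - a_{\alpha_i}$$

(for any $\alpha_i \in \alpha$), then the same conditions hold for $M$.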
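The reduction in the last condition (replace a pseudocherry by a single new leaf and subtract the twig length) can be illustrated on double weights only. The sketch below is a simplified illustration, not the full recursive test of the theorem; the names `twig_length` and `contract`, the label 0 for the new leaf, and the restriction to 2-subsets are all assumptions made for the example.

```python
def twig_length(D, ai, aj, x):
    """a_{alpha_i} = (D_{ai,aj} + D_{ai,x} - D_{aj,x}) / 2 for a leaf x outside the pseudocherry."""
    def d(u, v):
        return D[(min(u, v), max(u, v))]
    return (d(ai, aj) + d(ai, x) - d(aj, x)) / 2


def contract(D, alpha, new_leaf=0):
    """Replace the pseudocherry alpha by one leaf `new_leaf` (double weights only).

    D maps pairs (i, j) with i < j to real numbers; alpha has at least two elements.
    """
    ai, aj = alpha[0], alpha[1]
    outside = sorted({u for key in D for u in key} - set(alpha))
    a_i = twig_length(D, ai, aj, outside[0])
    reduced = {}
    for key, value in D.items():
        if not set(key) & set(alpha):          # pairs that avoid alpha are kept
            reduced[key] = value
    for x in outside:
        # D_{a,x} = D_{alpha_i,x} - a_{alpha_i}
        pair = (min(new_leaf, x), max(new_leaf, x))
        reduced[pair] = D[(min(ai, x), max(ai, x))] - a_i
    return reduced
```

On the double weights of the four-leaf tree with twigs 1, 2, 3, 4 and inner edge 5, contracting the cherry (1, 2) leaves a three-leaf data set in which the new leaf sits at distance 8 from leaf 3 and 9 from leaf 4.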

Finally we consider the case of a set of real numbers indexed by the $k$-submultisets of $[n]$ ($k$ fixed).

Theorem 11 Let $n \geq 4$ and $k \in \mathbb{N}$. Let $\{D_I\}_{I \in [n]_k}$ be a set of real numbers. It is the set of the
$k$-weights of a tree $T$ with leaves $1,\dots,n$ if and only if there exist $\alpha, \beta \subset [n]$ such that:

1) $\alpha$ and $\beta$ are disjoint complete pseudocherries and for any $\alpha_i \in \alpha$, $\beta_j \in \beta$ the number

$$D_{\alpha_i,X} - D_{\beta_j,X}$$

is the same for $X$ varying in the subsets of $[n]$ intersecting both $\alpha$ and $\beta$, it is the same for $X$
varying in the subsets of $\alpha$, it is the same for $X$ varying in the subsets of $\beta$.

2)

$$-D_{\alpha_i,Z} + D_{\beta_j,Z} - D_{\alpha_i,Y} + D_{\beta_j,Y} = 2(D_{\beta_j,W} - D_{\alpha_i,W})$$

for $\alpha_i \in \alpha$, $\beta_j \in \beta$, $Z \subset \alpha$, $Y \subset \beta$ and $W$ intersecting both $\alpha$ and $\beta$.

3) if we define $a_{\alpha_i}$ for any $\alpha_i \in \alpha$ by

$$a_{\alpha_i} = \frac{1}{k}\left(D_{\alpha_i^{k_i},\alpha_j^{k_j}} + k_j\,(D_{\alpha_i,X} - D_{\alpha_j,X})\right)$$

for any $X \subset [n]$, any $\alpha_j \in \alpha$, any $k_i, k_j \in \mathbb{N}$ with $k_i + k_j = k$, and analogously $a_{\beta_j}$, then

$$a_{\alpha_i} - a_{\beta_j} = D_{\alpha_i,A,D} + D_{\beta_j,B,D} - D_{\delta,A,D} - D_{\delta,B,D}$$

for any $A \subset \alpha$, $B \subset \beta$, $D \ni \delta$.

4) if we define $M = ([n] - \alpha) \cup \{a\}$ and

$$D_{a,i_1,\dots,i_{k-1}} = D_{\alpha_i,i_1,\dots,i_{k-1}} - a_{\alpha_i}$$

(for any $\alpha_i \in \alpha$), then the same conditions hold for $M$.

($X, Y, Z, W, A, B, D$ of size such that the size of all the indices is $k$.)

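Proof. $\Rightarrow$ Easy to prove.

$\Leftarrow$ As in the proof of Theorem 10, the definition of $a_{\alpha_i}$ (which is equivalent to the formula

$$\begin{cases} a_{\alpha_i} - a_{\alpha_j} = D_{\alpha_i,X} - D_{\alpha_j,X} \\ k_i a_{\alpha_i} + k_j a_{\alpha_j} = D_{\alpha_i^{k_i},\alpha_j^{k_j}} \end{cases}$$

for any $X \subset [n]$ and any $k_i, k_j \in \mathbb{N}$ with $k_i + k_j = k$) and the definition of $D_{a,i_1,\dots,i_{k-1}}$ are good
definitions.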

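We can prove the statement by induction on $n$. The case $n = 4$ follows from Lemma 7. Let us prove
the induction step. By induction assumption and condition 4, there exists a tree $R$ such that

$$D_I(R) = D_I \quad \forall I \in M_k.$$

We define the tree $T$ by attaching to $R$ a cherry $\alpha$ with lengths of the twigs $a_{\alpha_i}$ to the point $a$. We
must show that, for any $\alpha_1,\dots,\alpha_t \in \alpha$, $\delta_1,\dots,\delta_{k-t} \in [n] - \alpha$,

$$D_{\alpha_1,\dots,\alpha_t,\delta_1,\dots,\delta_{k-t}}(T) = D_{\alpha_1,\dots,\alpha_t,\delta_1,\dots,\delta_{k-t}}.$$

We prove this by induction on $t$. The case $t = 0$ is obvious and the case $t = 1$ is similar to the
analogous case in the proof of Theorem 10. As to the induction step, suppose first $k - t \geq 1$:

$$\begin{aligned}
D_{\alpha_1,\dots,\alpha_t,\delta_1,\dots,\delta_{k-t}}(T) &= D_{\alpha_2,\dots,\alpha_t,\delta_1,\dots,\delta_{k-t},\delta_{k-t}}(T) + a_{\alpha_1} - w(\delta_{k-t}, N(\delta_{k-t},T)) && \text{(def. of } T\text{)} \\
&= D_{\alpha_2,\dots,\alpha_t,\delta_1,\dots,\delta_{k-t},\delta_{k-t}} + a_{\alpha_1} - a_{\beta_j} + a_{\beta_j} - w(\delta_{k-t}, N(\delta_{k-t},T)) \\
&= D_{\alpha_2,\dots,\alpha_t,\delta_1,\dots,\delta_{k-t},\delta_{k-t}} + a_{\alpha_1} - a_{\beta_j} + a_{\beta_j} - D_{\beta_j,D,B}(R) + D_{\delta,D,B}(R)
\end{aligned}$$

for any $B \subset \beta$ and $D \ni \delta$. If we take $A = (\alpha_2,\dots,\alpha_t)$, $D = (\delta_1,\dots,\delta_{k-t})$ in condition 3, we get that
the number above is equal to $D_{\alpha_1,\dots,\alpha_t,\delta_1,\dots,\delta_{k-t}}$.

Suppose now that $k - t = 0$. We have to prove that $D_{\alpha_1,\dots,\alpha_k}(T) = D_{\alpha_1,\dots,\alpha_k}$. We can write it as

$$D_{\alpha_1^{s_1},\dots,\alpha_r^{s_r}}(T) = D_{\alpha_1^{s_1},\dots,\alpha_r^{s_r}}$$

with $\alpha_1,\dots,\alpha_r$ distinct and $s_1 + \dots + s_r = k$. We can prove it by induction on $s_3 + \dots + s_r$.
If $s_3 + \dots + s_r = 0$, the statement is obvious by the definitions of the $a_{\alpha_i}$ and $T$:

$$D_{\alpha_1^{s_1},\alpha_2^{s_2}}(T) = s_1 a_{\alpha_1} + s_2 a_{\alpha_2} = D_{\alpha_1^{s_1},\alpha_2^{s_2}}.$$

If $s_3 + \dots + s_r > 0$, we can suppose for instance that $s_r > 0$:

$$\begin{aligned}
D_{\alpha_1^{s_1},\dots,\alpha_r^{s_r}}(T) &= D_{\alpha_1^{s_1+1},\dots,\alpha_r^{s_r-1}}(T) + a_{\alpha_r} - a_{\alpha_1} \\
&= D_{\alpha_1^{s_1+1},\dots,\alpha_r^{s_r-1}} + D_{\alpha_r,X} - D_{\alpha_1,X}
\end{aligned}$$

for any $X$. By taking $X = (\alpha_1^{s_1},\dots,\alpha_r^{s_r-1})$ we get $D_{\alpha_1^{s_1},\dots,\alpha_r^{s_r}}$.

□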